ABSTRACT

Teachers/lecturers typically adapt their teaching to respond 
to students’ emotions, e.g. provide more examples when 
they think the students are confused. While getting a feel for
the students’ emotions is easier in small settings, it is much 
more difficult in larger groups. In these larger settings tex- 
tual feedback from students could provide information about 
learning-related emotions that students experience. Predic- 
tion of emotions from text, however, is known to be a diffi- 
cult problem due to language ambiguity. While prediction 
of general emotions from text has been reported in the lit- 
erature, very little attention has been given to prediction 
of learning-related emotions. In this paper we report sev- 
eral experiments for predicting emotions related to learning 
using machine learning techniques and n-grams as features, 
and discuss their performance. The results indicate that 
some emotions can be distinguished more easily than others.
1. INTRODUCTION

Detecting emotions is important in the learning process [4].
Positive emotions may increase students’ interest in learning,
increase engagement in the classroom and motivate students [4].
Additionally, students who are happy are generally more
motivated to accomplish their learning goals.

Sentiment analysis research has grown considerably in the 
last decade, mainly due to the availability of rich text re- 
sources such as social networking sites, blogs and micro- 
blogs, and product reviews. Despite the name of this area, 
sentiment analysis is mostly focused on detection of polarity 
(negative or positive sentiment) rather than specific emotions.
Thus, there is relatively little research on the prediction of
specific emotions from text [2, 3], with even fewer reports of
such research in education [9]. Moreover, of these studies
(both within the educational field and outside of it), an even
smaller number use machine learning to predict emotion from
text, e.g. [2, 3, 9].

In this paper we focus on the prediction of emotions relevant 
for learning from students’ textual feedback via Twitter in 
a classroom context using machine learning techniques. To 
investigate the prediction of the identified emotions from 
text, we experiment with several preprocessing methods, n- 
gram features, and machine learning techniques. 

2. RELATED RESEARCH 

There are four main steps to create predictive models from 
text with machine learning: preprocessing the data, select- 
ing the features, applying the machine learning techniques 
and evaluating the results. 

Preprocessing the data involves preparing the data and clean- 
ing it from unwanted elements which may negatively af- 
fect the performance of the machine learning techniques. 
Some of the general preprocessing techniques used with basic
text are: tokenization, conversion of text to lower or upper
case, and removal of punctuation, numbers and stop words [8].

Preprocessing Twitter data requires additional techniques 
due to the presence of emoticons, hashtags and chat lan- 
guage. Some of the Twitter-specific data preprocessing tech- 
niques from previous research [8, 11] are: removing hashtags, 
removing URLs, removing retweets, identifying emoticons, 
removing user mentions in tweets, removing Twitter special 
characters, and slang/chat language handling. 

For the detection of specific emotions, both general
preprocessing techniques and Twitter-related preprocessing
techniques have been used, e.g. removal of stop words and
stemming [3], removing URLs [5], and tokenization [5].

Feature selection refers to the process of selecting relevant 
features for the particular prediction problem, while elim- 
inating the features that are redundant or irrelevant. In 
prediction problems where the data is in the form of text, 
the most common features are n-grams [7]. The most commonly
used n-grams for emotion detection are unigrams (one word) [7].
In contrast, very few studies have investigated the use of
bigrams (two words) and trigrams (three words) in emotion
prediction, although bigrams and trigrams have been used in
sentiment analysis of tweets [7]. In this paper, we investigate
the influence of these different n-grams and their combinations
on emotion detection.
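
To make the feature definitions concrete, the following minimal sketch
extracts unigrams, bigrams and trigrams from a tokenized sentence. The
function name ngrams and the example sentence are illustrative
assumptions, not taken from the experiments.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    # A list shorter than n yields an empty range, hence no n-grams.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical piece of feedback, already tokenized on whitespace.
tokens = "the lecture was really confusing".split()

unigrams = ngrams(tokens, 1)  # single words
bigrams = ngrams(tokens, 2)   # pairs of adjacent words
trigrams = ngrams(tokens, 3)  # triples of adjacent words
```

A sentence of five tokens therefore yields five unigrams, four bigrams
and three trigrams, which is why higher-order n-grams produce sparser
features on short texts such as tweets.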

Various machine learning techniques have been used for po- 
larity and emotions prediction from text. In our experiments 
we used classifiers previously shown to work well [9]: Naive 
Bayes (NB), Multinomial Naive Bayes (MNB), Complement 
Naive Bayes (CNB), Support Vector Machines (SVM), Max- 
imum Entropy (ME), Sequential Minimal Optimization (SMO), 
and Random Forest (RF). 

Previous research on emotions related to learning indicates 
a variety of emotions experienced by learners [6]. In pre- 
vious research [1], we identified from the literature a num- 
ber of common emotions that are associated with learning: 
amused, anxiety, appreciation, awkward, bored, confusion, 
disappointed, embarrassed, engagement, enthusiasm, excite- 
ment, frustration, happy, motivated, proud, relief, satisfac- 
tion, shame and uninterested. 

3. DATA CORPUS 

The data was collected from lectures taught in English in 
Jordanian universities on different topics: calculus, English 
communication skills, database, engineering, molecular bi- 
ology, chemistry, physics, science, contemporary history of 
the world and architecture. 

Twitter was used to collect students’ feedback, opinions, and
feelings about the lecture. For each tweet, the students were
asked to choose one emotion from a set of emotions provided,
i.e. the 19 emotions listed in the previous section. Although
tweets were used, the language was formal and did not include
chat language or slang; the tweets did, however, include
emoticons and hashtags.

A total number of 1522 tweets were collected with their cor- 
responding emotion label. There was one label per feedback. 
Some of the emotions appeared more frequently than others. 
The most frequent emotions that were used in our research 
were: Bored (336), Amused (216), Frustration (213), Ex- 
citement (178), Enthusiasm (176), Anxiety (130), Confusion 
(73), and Engagement (67). The least frequent ones were 
discarded due to insufficient data for training and testing 
machine learning algorithms: Happy (32), Satisfaction (31),
Appreciation (26), Embarrassed (18), Disappointed (12),
Uninterested (4), Proud (3), Relief (3), Shame (2), Awkward
(1), and Motivated (1).

4. PREDICTION OF EMOTIONS FROM 
STUDENTS’ FEEDBACK 

Two different preprocessing levels were experimented with:

(a) high preprocessing, which includes: tokenization, conversion
of text to lower case, and removal of punctuation, numbers, stop
words, hashtags, URLs, retweets, user mentions and Twitter
special characters;

(b) low preprocessing, which includes: tokenization, conversion
of text to lower case, and removal of stop words.
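
The two levels can be sketched as follows. This is only an
illustration under several assumptions: the stop-word list is a tiny
made-up one, "remove retweets" is interpreted as stripping the RT
marker, "remove hashtags" as dropping the whole tag, and Twitter
special characters are covered by the generic punctuation removal.
The function names are hypothetical.

```python
import re
import string

# Tiny illustrative stop-word list (assumption; the actual list is not given).
STOP_WORDS = {"a", "an", "the", "is", "was", "to", "of", "and", "in", "it"}


def low_preprocess(text):
    """Low level: whitespace tokenization, lower-casing, stop-word removal."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def high_preprocess(text):
    """High level: also strip URLs, RT markers, mentions, hashtags,
    numbers and punctuation before tokenizing."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"\brt\b", " ", text)        # retweet marker (assumed meaning)
    text = re.sub(r"[@#]\w+", " ", text)       # user mentions and hashtags
    text = re.sub(r"\d+", " ", text)           # numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [t for t in text.split() if t not in STOP_WORDS]


# Hypothetical tweet used only to contrast the two levels.
tweet = "RT @lecturer: The lecture was confusing!!! :( #calculus http://t.co/abc 2"
low_tokens = low_preprocess(tweet)
high_tokens = high_preprocess(tweet)
```

In this sketch the low level keeps the emoticon and the repeated
exclamation marks attached to "confusing", whereas the high level
reduces the tweet to the two content words.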


The high preprocessing was only used for one of the models,
which contained all the emotions combined, because it led to
lower results than the low level of preprocessing for this
model. Consequently, for the other models only the low
preprocessing was experimented with.

The negative influence of preprocessing on the performance 
of the models indicates that information that is typically 
discarded for polarity prediction has value for the identifi- 
cation of specific emotions, as for example in the case of 
punctuation [11]. 

We experimented with different n-grams, i.e. unigrams, bi- 
grams, and trigrams, and all combinations between them 
to find which n-gram or combination of n-grams leads to 
the best performance for the different models. The features 
that were experimented with are: Unigrams (UNI); Bigrams 
(BI); Trigrams (TRI); Unigrams and Bigrams combined; 
Unigrams and Trigrams combined; Bigrams and Trigrams 
combined; and Unigrams, Bigrams, and Trigrams combined. 
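
The seven feature sets are simply all non-empty combinations of the
three n-gram types. A minimal sketch that enumerates them and builds
the features for one combination is given below; the labels follow the
abbreviations used in this paper, while the helper extract is a
hypothetical name.

```python
from itertools import combinations

BASE = ("UNI", "BI", "TRI")
N_OF = {"UNI": 1, "BI": 2, "TRI": 3}

# All non-empty subsets of the three n-gram types, smallest first.
feature_sets = [
    "+".join(combo)
    for size in range(1, len(BASE) + 1)
    for combo in combinations(BASE, size)
]


def extract(tokens, feature_set):
    """Concatenate the n-grams of every type named in feature_set."""
    feats = []
    for name in feature_set.split("+"):
        n = N_OF[name]
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


example = extract(["not", "very", "clear"], "UNI+TRI")
```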

We used the classifiers mentioned previously in section 2 due 
to their common use in previous research. Additionally, we 
used two common kernels for SVM: radial basis (RB) and 
linear (LIN) kernel. 

We experimented with all the emotions combined and then
removed, in turn, the emotion with the lowest number of
instances. The total number of models experimented with was 16:
the model with all the emotions at each of the two preprocessing
levels; 7 emotions (all except engagement) + other (8 classes);
6 emotions (7 emotions except confusion) + other (7 classes);
5 emotions (6 emotions except anxiety) + other (6 classes);
4 emotions (5 emotions except enthusiasm) + other (5 classes);
3 emotions (4 emotions except excitement) + other (4 classes);
2 emotions (amused, bored) + other (3 classes); and each
emotion + other (2 classes).
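
The construction of these label sets can be sketched from the class
counts in section 3: rank the emotions by frequency, repeatedly drop
the least frequent one, and map every dropped emotion to "other". The
names relabel, multi_class_models and binary_models are illustrative
assumptions.

```python
# Class counts for the eight retained emotions (section 3).
COUNTS = {
    "Bored": 336, "Amused": 216, "Frustration": 213, "Excitement": 178,
    "Enthusiasm": 176, "Anxiety": 130, "Confusion": 73, "Engagement": 67,
}

# Emotions ordered from most to least frequent.
ranked = sorted(COUNTS, key=COUNTS.get, reverse=True)

# "k emotions + other" models for k = 7 down to 2.
multi_class_models = [ranked[:k] for k in range(7, 1, -1)]

# One "emotion + other" model per emotion.
binary_models = [[emotion] for emotion in ranked]


def relabel(label, kept):
    """Map a tweet's label to itself if kept in the model, else to 'other'."""
    return label if label in kept else "other"
```

This yields 6 multi-class and 8 two-class label sets; together with
the all-emotion model at the two preprocessing levels (as listed in
Table 1) this gives the 16 models.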

All the models were tested using 10-fold cross-validation; the 
accuracy and the error rate were used to assess the overall 
performance of the classifiers, while the precision, recall, and 
F-score were used to assess the ability of the classifiers to 
correctly identify the specific emotion(s). 
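
For reference, the five metrics can be computed from the confusion
counts of an emotion class as in the sketch below. The function name
binary_metrics and the convention of returning 0.0 when a denominator
is zero are assumptions made for illustration; the example counts are
invented.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, error rate, precision, recall and F-score for one class.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative
    and true-negative counts for the emotion class of interest.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0   # assumed convention
    recall = tp / (tp + fn) if tp + fn else 0.0      # assumed convention
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Invented counts: 40 emotion tweets, 160 "other" tweets.
m = binary_metrics(tp=30, fp=70, fn=10, tn=90)
```

With these invented counts the recall is high (30 of 40 emotion
tweets are found) while the precision is low (only 30 of 100
predicted emotion tweets are correct).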

The results indicate that the models with a single emotion
perform better than the multi-emotion models in terms of
accuracy, although one has to bear in mind that the baseline
for multi-class models is lower than the baseline for 2-class
models.
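
The difference in baselines can be illustrated with a majority-class
baseline, i.e. the accuracy of always predicting the largest class.
The sketch below assumes that the models were trained on the 1389
tweets of the eight retained emotions and that the discarded emotions
are not part of "other"; this is an assumption, as is the function
name majority_baseline.

```python
# Class counts for the eight retained emotions (section 3).
COUNTS = {
    "Bored": 336, "Amused": 216, "Frustration": 213, "Excitement": 178,
    "Enthusiasm": 176, "Anxiety": 130, "Confusion": 73, "Engagement": 67,
}
TOTAL = sum(COUNTS.values())  # assumed data set size for all models


def majority_baseline(kept):
    """Accuracy of always predicting the largest class of a model that
    keeps the emotions in `kept` and merges the rest into 'other'."""
    kept_counts = [COUNTS[e] for e in kept]
    other = TOTAL - sum(kept_counts)
    return max(kept_counts + [other]) / TOTAL


seven = [e for e in COUNTS if e != "Engagement"]
baseline_engagement = majority_baseline(["Engagement"])  # 2-class model
baseline_seven = majority_baseline(seven)                # 8-class model
```

Under these assumptions, always predicting "other" in the
"engagement + other" model is already correct for 1322 of 1389
tweets, whereas in the 8-class model the largest class (Bored) covers
only 336 of 1389 tweets.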

The results show that two classifiers performed best in terms 
of accuracy: the Support Vector Machine with Radial Basis 
kernel (RB), mainly for the 2-class models, and Sequential 
Minimal Optimization (SMO), mainly for the multi-class 
models. In terms of features, unigrams and trigrams were
found to lead to the best performance for the 2-class models,
while unigrams combined with bigrams and trigrams led to the
best performance for the multi-class models.

Although accuracy is useful for assessing the models’ overall
performance, it does not indicate how well a classifier can
predict specific emotions. As the recall indicates the
percentage of correctly identified instances for a class of
interest, it can be used to assess the ability of the
classifiers to predict emotions; in addition, precision can
indicate where the identification problems occur.

Table 1: Highest recall for each model

Model                  Technique  N-gram      Accuracy  Error rate  Precision  Recall  F-score
ALL Preprocessed       ME         UNI+BI+TRI  0.32      0.68        0.34       0.33    0.33
ALL W/O Preprocessing  ME         UNI+BI      0.32      0.68        0.33       0.32    0.32
7 Emotions + other     NB         BI+TRI      0.26      0.74        0.24       0.25    0.25
6 Emotions + other     MNB        UNI         0.27      0.73        0.27       0.26    0.27
5 Emotions + other     MNB        UNI+TRI     0.25      0.75        0.32       0.32    0.32
4 Emotions + other     MNB        BI          0.26      0.74        0.29       0.38    0.33
3 Emotions + other     ME         UNI+BI+TRI  0.51      0.49        0.43       0.36    0.39
2 Emotions + other     ME         UNI+BI+TRI  0.57      0.43        0.40       0.51    0.45
Amused                 CNB        TRI         0.49      0.51        0.19       0.70    0.30
Anxiety                CNB        TRI         0.45      0.55        0.12       0.77    0.21
Bored                  CNB        TRI         0.44      0.56        0.28       0.85    0.42
Confused               CNB        TRI         0.28      0.72        0.06       0.81    0.11
Engagement             CNB        TRI         0.24      0.76        0.04       0.68    0.08
Enthusiasm             CNB        TRI         0.36      0.64        0.14       0.76    0.24
Excitement             CNB        TRI         0.37      0.63        0.15       0.86    0.26
Frustration            CNB        TRI         0.40      0.60        0.19       0.84    0.31

Table 2: Best overall models for identification of specific emotions

Model                  Technique  N-gram      Accuracy  Error rate  Precision  Recall  F-score
Amused                 CNB        BI+TRI      0.64      0.36        0.24       0.62    0.35
Bored                  CNB        UNI+BI+TRI  0.71      0.29        0.43       0.63    0.51
Excitement             CNB        UNI+TRI     0.64      0.36        0.21       0.64    0.32

For most of the models with the highest accuracy, the recall is
extremely low or even 0% in some cases. In addition, precision
is also low for most of the models (with a few exceptions). For
instance, in the “engagement + other” model the accuracy is 95%,
while the precision, recall, and F-score are (0-0.05)% for the
emotion class. This indicates that the high accuracy is due to
the correct identification of the “other” class rather than the
correct identification of emotion(s).

Table 1 displays the best experimental results when focusing 
on the recall, i.e. the correct identification of the emotion(s). 
In terms of machine learning techniques, Complement Naive
Bayes (CNB) performs best for half of the models, which
could be explained by the ability of this technique to
compensate for uneven class sizes. In terms of features, trigrams
led to the best performance in the 2-class models, while un- 
igrams combined with bigrams and trigrams led to the best 
performance in the multi-class models. 
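
To illustrate why Complement Naive Bayes can cope with uneven class
sizes, the sketch below implements its core idea in plain Python: the
weights of a class are estimated from the tokens of all other classes
(its complement), and a text is assigned to the class whose complement
it matches least. This is a simplified illustration and not the
implementation used in the experiments; the function names, the
smoothing value alpha and the toy data are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_cnb(docs, labels, alpha=1.0):
    """Estimate complement weights.

    docs is a list of token lists, labels the parallel list of classes.
    For class c and token t the weight is
    log((count of t outside c + alpha) / (tokens outside c + alpha * |V|)).
    """
    vocab = {tok for doc in docs for tok in doc}
    per_class = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        per_class[label].update(doc)
    overall = Counter()
    for counts in per_class.values():
        overall.update(counts)
    total_tokens = sum(overall.values())

    weights = {}
    for label, counts in per_class.items():
        complement_total = total_tokens - sum(counts.values())
        denom = complement_total + alpha * len(vocab)
        weights[label] = {
            tok: math.log((overall[tok] - counts[tok] + alpha) / denom)
            for tok in vocab
        }
    return weights


def predict_cnb(weights, doc):
    """Pick the class whose complement fits the document worst."""
    scores = {
        label: sum(w[tok] for tok in doc if tok in w)  # unseen tokens ignored
        for label, w in weights.items()
    }
    return min(scores, key=scores.get)


# Toy, deliberately imbalanced training data (invented for illustration).
docs = [["boring", "lecture"], ["boring", "slow"],
        ["slow", "lecture"], ["fun", "lecture"]]
labels = ["bored", "bored", "bored", "amused"]
model = train_cnb(docs, labels)
```

Because the statistics of the small class are estimated from the much
larger complement, they rest on more data than the class's own few
examples would provide.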

The fact that the models with high recall rates have low 
accuracy and low precision values indicates that many in- 
stances of the “other” class are wrongly classified as indi- 
cating particular emotions. In other words, although the 
classifiers have a higher sensitivity for the emotion classes, 
they are not precise in distinguishing the “other” class from
the emotion class(es).
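
As a rough worked example, the reported precision and recall can be
turned into approximate confusion counts. The sketch below does this
for the “bored + other” row of Table 1, assuming that the model was
evaluated on the 1389 tweets of the eight retained emotions (an
assumption); the resulting counts are estimates from rounded values,
not figures reported in the experiments.

```python
def implied_counts(positives, total, precision, recall):
    """Approximate TP, FP, FN, TN implied by precision and recall."""
    tp = recall * positives          # emotion tweets found
    fp = tp / precision - tp         # "other" tweets labelled as the emotion
    fn = positives - tp              # emotion tweets missed
    tn = (total - positives) - fp    # "other" tweets correctly rejected
    return tp, fp, fn, tn


TOTAL = 1389   # assumed evaluation set size
BORED = 336    # number of Bored tweets (section 3)

tp, fp, fn, tn = implied_counts(BORED, TOTAL, precision=0.28, recall=0.85)
implied_accuracy = (tp + tn) / TOTAL
other_misclassified = fp / (TOTAL - BORED)
```

Under these assumptions roughly 734 of the 1053 “other” tweets would
be labelled as bored, and the implied accuracy is close to the 0.44
reported in Table 1.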


When looking at the overall picture and the balance of the
evaluation metrics considered (i.e. accuracy, error rate,
precision and recall), some of the models stand out; these are
presented in Table 2. We found that the best classifier is
Complement Naive Bayes (CNB). When looking at the features, one
can notice that different combinations of n-grams led to the
best performance for different models. This indicates that a
combination of various n-grams instead of a single n-gram is
useful for the prediction of specific emotions and should be
investigated further.

It is not surprising that the best performing models are for
the emotions for which we had a larger number of instances
(see section 3), i.e. bored, amused and excitement.
Interestingly, the models for excitement performed better than
the ones for frustration, although there were more instances
for frustration than for excitement.

Of the previous research studies focusing on the prediction of
emotions using machine learning techniques, only one study
was conducted in an educational context [9]. This research
used part-of-speech (POS) tags as features; more specifically,
the authors experimented with combinations of the following
part-of-speech tags: verb, adverb, adjective and noun. They
evaluated their models using precision, recall, and F-score and
found that Random Forest performed better than the other
classifiers, with a weighted average F-score of 0.638. Similar
to our research, they found that the recall score was higher
than the precision. Of the emotions that we identified as
relevant for learning from previous literature, they only
looked at anxiety, for which they obtained a precision value of
0.6 using a LogitBoost classifier. However, this research was
conducted on Chinese text, which has different characteristics
and structures compared with English text. Moreover, the
research was based on text from online chats and discussion
groups. Furthermore, their approach used an affective word base
(i.e. lexicon), where each affective word had a number
associated with its degree of reflection of a particular
emotion.

Outside the educational domain, there are very few studies 
that looked at the prediction of specific emotions from text 
only, which are described below. 

One study, which used unigrams and experimented with a
multi-class model with 5 emotions [3], found that the Naive
Bayes and Support Vector Machine classifiers performed well,
leading to an accuracy of 67%. This data, however, is not
representative of other types of text expressing emotions, as
indicated by the low accuracy, i.e. less than 35%, of these
models on test sets with other data. Similarly to the research
described above, they also experimented with lexicons for
specific emotions.

Another study, which used unigrams as features and machine
learning, looked at predicting the presence of emotion versus
the lack of emotion [2]; the authors obtained a maximum accuracy
of 74%. However, they did not discuss the performance in terms
of identifying the presence of emotion (i.e. recall for the
emotion). They also used lexicons with emotion-related words.

However, very few studies have investigated the use of other
n-grams. Yuan and Purver [10] investigated the prediction of
emotions from the Chinese microblog service Sina Weibo; in 
their experiments they found that the models with bigrams 
and trigrams outperformed the models using unigrams. Sim- 
ilarly, our results showed that using all of the n-grams (i.e. 
unigrams, bigrams, and trigrams) combined led to the best 
identification of emotions for the multi-emotion models. Ad- 
ditionally, we found that trigrams led to the best identifica- 
tion of emotions for the 2-class models. 

While it is difficult to compare the performance of our models
with previous work given the variations in experimental set-ups
(e.g. data origin, language, choice of emotions, choice of
features and the use of lexicons), one aspect that seems to be
prevalent in previous research is the use of lexicons.
Consequently, in our future work, we will investigate the use
of such an affective word base for education and its effect on
the prediction models.

5. CONCLUSIONS AND FUTURE WORK 

In this paper we conducted several experiments to investigate
the prediction of specific emotions related to learning from
students’ textual classroom feedback.
We focused on several learning emotions which were found 
to be relevant from previous literature: Amused, Anxiety, 
Bored, Confusion, Engagement, Enthusiasm, Excitement, 
and Frustration. We experimented with several preprocess- 
ing and machine learning techniques, and also with different 
combinations of n-gram features. 

The models were evaluated using 10-fold cross-validation
and the following evaluation metrics: accuracy, error rate,
precision, recall, and F-score. The best performing models were
obtained for three particular emotions using 2-class models:
amused, bored and excitement. The best classifier was
Complement Naive Bayes (CNB). A combination of n-grams led to
the best performance in most models.

In future work we will investigate the influence on prediction 
of a learning-related emotion lexicon; we will also investigate 
the relation between learning emotions and polarity. 